Classifying Languages by Dependency Structure. Typologies of Delexicalized Universal Dependency Treebanks
نویسندگان
چکیده
This paper shows how the current Universal Dependency treebanks can be used for clustering structural global linguistic features of the treebanks to reveal a purely structural syntactic typology of languages. Different uniand multi-dimensional data extraction methods are explored and tested in order to assess both the coherence of the underlying syntactic data and the quality of the clustering methods themselves.
منابع مشابه
Delexicalized and Minimally Supervised Parsing on Universal Dependencies
In this paper, we compare delexicalized transfer and minimally supervised parsing techniques on 32 different languages from Universal Dependencies treebank collection. The minimal supervision is in adding handcrafted universal grammatical rules for POS tags. The rules are incorporated into the unsupervised dependency parser in forms of external prior probabilities. We also experiment with learn...
متن کاملA Transition-based System for Universal Dependency Parsing
This paper describes the system for our participation of team Wanghao-ftd-SJTU in the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies. In this work, we design a system based on UDPipe1 for universal dependency parsing, where transitionbased models are trained for different treebanks. Our system directly takes raw texts as input, performing several intermedia...
متن کاملA Representation Learning Framework for Multi-Source Transfer Parsing
Cross-lingual model transfer has been a promising approach for inducing dependency parsers for lowresource languages where annotated treebanks are not available. The major obstacles for the model transfer approach are two-fold: 1. Lexical features are not directly transferable across languages; 2. Target languagespecific syntactic structures are difficult to be recovered. To address these two c...
متن کاملAn annotation scheme for Persian based on Autonomous Phrases Theory and Universal Dependencies
A treebank is a corpus with linguistic annotations above the level of the parts of speech. During the first half of the present decade, three treebanks have been developed for Persian either originally or subsequently based on dependency grammar: Persian Treebank (PerTreeBank), Persian Syntactic Dependency Treebank, and Uppsala Persian Dependency Treebank (UPDT). The syntactic analysis of a sen...
متن کاملDelexicalized transfer parsing for low-resource languages using transformed and combined treebanks
This paper describes the IIT Kharagpur dependency parsing system in CoNLL2017 shared task on Multilingual Parsing from Raw Text to Universal Dependencies. We primarily focus on the lowresource languages (surprise languages). We have developed a framework to combine multiple treebanks to train parsers for low resource languages by a delexicalization method. We have applied transformation on the ...
متن کامل